On Efficient Management of XML Documents

Author

Hongjun Lu

Abstract

XML has become a de facto standard for data representation and exchange on the World Wide Web. Unlike HTML tags, which mainly describe presentation, XML tags capture some semantics, especially when documents are authored against domain-specific common DTDs. As industry has embraced XML and more and more XML documents are generated, the issues of managing XML documents efficiently have to be addressed.

Storing XML documents. A number of approaches have been proposed for storing XML documents. They can be categorized along two dimensions. One dimension is how an XML document is modelled: XML documents can be managed as text files, by a DBMS, or by a native XML engine. When managed as text files, they are viewed as character strings. When a DBMS is used, they are transformed to conform to a specific data model, e.g., the relational model. Most native XML engines model XML documents as trees, since the elements of an XML document are ordered and strictly nested.

Orthogonal to this dimension is whether the DTD is used in the storage model. For example, when XML data is stored in a relational system, the relational schema can be generated with or without the element type information in the DTD. When the schema is generated from the DTDs, XML documents with different DTDs have different schemas, so the schema is document dependent. On the other hand, since any XML document can be modelled as an ordered tree, a relational schema that can describe the tree structure and the position of elements within it is sufficient. With this approach no DTD information is required, and all XML documents share the same relational schema. That is, such a schema is document independent.
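The document-independent approach described above can be illustrated with a minimal sketch. It assumes a single generic table, here called `node`, whose column names (`parent_id`, `ordinal`, `tag`, `text`) are hypothetical and chosen only for illustration; this is not the schema used by VXMLR, XParent, or XBase. Each element becomes one row recording its parent and its position among its siblings, so any XML document fits the same table without a DTD. Attributes and trailing text are ignored to keep the example short.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store one XML document as rows of a generic, DTD-free node table."""
    # Hypothetical schema: one row per element, same for every document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"   # assigned in document (pre-order) order
        " parent_id INTEGER,"        # NULL for the root element
        " ordinal INTEGER,"          # position among siblings, from 0
        " tag TEXT,"
        " text TEXT)"
    )
    root = ET.fromstring(xml_text)
    # Iterative pre-order traversal: (element, parent row id, sibling position).
    stack = [(root, None, 0)]
    while stack:
        elem, parent_id, ordinal = stack.pop()
        cur = conn.execute(
            "INSERT INTO node (parent_id, ordinal, tag, text)"
            " VALUES (?, ?, ?, ?)",
            (parent_id, ordinal, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        # Push children in reverse so the first child is popped first.
        for pos, child in reversed(list(enumerate(elem))):
            stack.append((child, node_id, pos))
    conn.commit()


def child_tags(conn, parent_tag):
    """Return the tags of the children of `parent_tag`, in sibling order."""
    rows = conn.execute(
        "SELECT c.tag FROM node AS c"
        " JOIN node AS p ON c.parent_id = p.id"
        " WHERE p.tag = ? ORDER BY p.id, c.ordinal",
        (parent_tag,),
    ).fetchall()
    return [tag for (tag,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred("<book><title>XML</title><author>Lu</author></book>", conn)
    print(child_tags(conn, "book"))
```

A DTD-based mapping would instead derive one table layout per document type, which can make typed queries simpler at the cost of a schema that changes with the DTD.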
Recently, we conducted a benchmarking study to compare the performance of various schema mapping and storage methods implemented in three experimental XML database systems: VXMLR (Zhou, Lu, Zheng, Liang, Zhang, Ju & Tian 2001), XParent (Jiang, Lu, Wang & Yu 2002), and XBase (Lu, Wang, Yu, Bao, Lv & Yu 2002). VXMLR and XParent were built on top of an RDBMS, and XBase is a native XML engine. Based on the categorization above, the storage models compared can be summarized as in Figure 1. The benchmarks used in the study are the data-centric XMark benchmark (Schmidt, Waas, Kersten, Florescu, Manolescu, Carey & Busse 2001) and the document-centric XMach benchmark (Böhme & Rahm 2001). We will

Similar resources

Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity

Due to the increasing number of XML documents, organizing these documents effectively in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between XML documents. Conventional clustering of text documents using a do...

Hybrid XML data model architecture for efficient document management

XML is known as a document standard for the representation and exchange of data on the Internet, and is also used as a standard language for the search and reuse of scattered documents on the Internet. The issues related to XML are how to model data for effective and efficient management of semi-structured data and how to actually store the modeled data when implementing an XML contents manageme...

Storing and Querying Multiversion XML Documents using Durable Node Numbers

Managing multiple versions of XML documents represents an important problem for many traditional applications, such as software configuration control, as well as new ones, such as link permanence of web documents. Research on managing multiversion XML documents seeks to provide efficient and robust techniques for storing, retrieving and querying such documents. In this paper, we present a novel...

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smart phones. Methods: This multi-method research follows simulation technique from mixed models of the operations research methodology, and the documentary research method, simultaneously. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...

HCMX: An Efficient Hybrid Clustering Approach for Multi-version XML Documents

In order to retrieve useful information from the large and growing number of XML documents on the web, effective management of XML documents is essential. One solution is to cluster XML documents to find knowledge that promotes effective information management and maintenance. But in the real world XML documents are dynamic in nature. In contrast to static XML documents, changes from one version of XML d...

Utilizing XML Clustering for Efficient XML Data Management on P2P Networks

Peer-to-Peer (P2P) data integration combines the P2P infrastructure with traditional scheme-based data integration techniques. Some of the primary problems in this research area are the techniques to be used for querying, indexing and distributing documents among peers in a network especially when document files are in XML format. In order to handle this problem we describe an XML P2P system th...

Publication date: 2003
